Hosted by Sourceforge TWiki > DLibrary > WebHome (book view) TWiki webs:

Topics in DLibrary web: Changed: GMT Changed by:

DLibraryDesign  

18 Nov 2003 - 18:09 - NEW   RjHonicky

Here is a simpler ToDo list.

See ProxyStorageInterface for details on how we interface between the Proxy server and the RemoteRepository.

The list of MatulyasQuestions and RJ's answers has moved.

 


DiSC  

10 Mar 2004 - 01:28 - NEW   RjHonicky

DiSC is the new name for the DLibrary project -- RjHonicky - 10 Mar 2004

 


DocumentSources  

10 Mar 2004 - 01:23 - NEW   RjHonicky

Here are some starting points for online documents which might belong in a DiSC repository

-- RjHonicky - 10 Mar 2004

 


DownloadedPapers  

14 Dec 2003 - 02:28 - r1.3   RjHonicky

Attach papers here by clicking on the "Attach" link at the bottom

 


InterestingIdeas  

05 Mar 2004 - 14:09 - r1.5   RjHonicky

 


KnownBugs  

09 Dec 2003 - 08:14 - r1.5   RjHonicky

 


MatulyasQuestions  

18 Nov 2003 - 18:09 - NEW   RjHonicky

Data Collection

Data Interpretation

Architecture

Proxy to Internet; Data Distribution mechanism; Indexing mechanism; Searching mechanism; Proxy to user interface; User Interface

Issues

  1. For the hash algorithm, we should use the one I just published: I’ll send it
  2. For cache replacement, LRU
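
Item 2 above names LRU for cache replacement. As an illustration only, here is a minimal Java sketch of an LRU map built on the standard `LinkedHashMap`. The class name `LruCache` and the entry-count capacity are assumptions made for this example; nothing here describes how RabbIT's own cache actually evicts entries.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU map: drops the least recently accessed entry once capacity is exceeded. */
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int capacity;

    LruCache(int capacity) {
        // accessOrder = true: iteration order runs from least to most recently accessed.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by LinkedHashMap after each insertion; returning true evicts the eldest entry.
        return size() > capacity;
    }
}
```

With a capacity of 2, putting `a` and `b`, reading `a`, and then putting `c` evicts `b`, since `b` is then the least recently used key.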

-- RjHonicky - 18 Nov 2003

 


ProjectDocs  

03 May 2004 - 01:00 - r1.5   RjHonicky

We are currently designing experiments which utilize DLibrary. Matulya wrote a quick Word document outline, which I have ported into CVS in the documentation module. Matulya's version is here for reference.

RunningDiSC describes how to check out and run everything

-- RjHonicky - 02 Dec 2003

 


ProjectLinks  

14 Dec 2003 - 02:18 - r1.7   RjHonicky

 


ProxyStorageInterface  

22 Nov 2003 - 19:03 - r1.2   RjHonicky

The proxy server (RabbIT) currently stores meta-data and data separately: there is a single file which contains cache data, plus the header of each of the requests which resulted in the object being cached. This file is managed by the NCache and NCacheEntry objects. This is great, since it provides a single class through which all the metadata is managed.

Data, however, is managed by several classes since it is streamed, rather than read and then written (a good design for a proxy cache, but which complicates things for us). The relevant classes are

Both interfaces (Lucene and RabbIT) want to deal with streams, so I have defined a set of classes that allow you to pass references to streams across process boundaries. In RemoteIO, I define a detailed interface for remote streams. The code is in CVS under the package remoteio.

With remote streams, the interface to the RemoteRepository works as follows

 


RankingExperiment  

07 Dec 2003 - 01:18 - NEW   RjHonicky

In the RankingExperiment, the cache is first primed using a trace, or perhaps simply from usage over a period. A set of users is then given a list of search queries to perform. These queries can be of two types:

  1. Do the following exact query (e.g. "president bush incompetent")
  2. Run a query to find pages discussing something (e.g. the incompetence of "president" bush)

The second type of query allows the user to select the semantic meaning of the query, instead of forcing them to interpret or guess the query's meaning. If the semantic meaning of the query were supplied with the exact query, the results could be skewed by an inexact match between the query and the meaning provided. Both types of queries are therefore supplied, and analyzed separately.

The query is then performed against both the cache and Google.com, retrieving up to some maximum number of documents each. These results are randomly mixed and merged, and displayed for the user. The URL, title, and summary of the document are not displayed, so that the user is not biased by the quality of the summary, title extraction, URL cleanup, etc., but rather only considers the ranking of the documents.
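
The mixing step could be sketched in Java as follows. This is only an illustration under stated assumptions: results are identified internally by a string key (for example the URL, which is still hidden from the user), documents returned by both sources are shown once, and the names `ResultMixer` and `mix` are hypothetical rather than taken from the DiSC code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

class ResultMixer {
    /**
     * Merges the two ranked result lists, keeping one copy of any document
     * returned by both (an assumption), then shuffles the merged list so that
     * neither the original rank nor the source can be inferred from position.
     */
    static List<String> mix(List<String> cacheResults, List<String> googleResults, Random rng) {
        Set<String> merged = new LinkedHashSet<String>(cacheResults);
        merged.addAll(googleResults);
        List<String> shuffled = new ArrayList<String>(merged);
        Collections.shuffle(shuffled, rng);
        return shuffled;
    }
}
```

Passing the `Random` in explicitly lets an experimenter fix the seed and reproduce the order a given user saw.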

The user browses all of the results, then chooses and ranks the top five documents in the result set.

For each of the top five documents chosen by the user, if the document is one of Google's top five, Google gets a point. If the document is one of the cache's top five, the cache gets a point.

Google's score is then subtracted from the cache's score. A mean and standard deviation are calculated for the difference, over a large set of queries. By gathering rankings on the same query from several users, we can also calculate a standard error for our results.
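
The scoring rule can be written down as a short Java sketch. The class and method names are hypothetical, documents are identified by an arbitrary string key, and the use of the sample standard deviation (n - 1 denominator) and of standard deviation divided by the square root of n for the standard error are assumptions, since the text does not say which estimators are intended.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RankingScore {
    /** Cache points minus Google points for one user's top-five picks on one query. */
    static int difference(List<String> userTop5, List<String> cacheTop5, List<String> googleTop5) {
        Set<String> cache = new HashSet<String>(cacheTop5);
        Set<String> google = new HashSet<String>(googleTop5);
        int cachePoints = 0;
        int googlePoints = 0;
        for (String doc : userTop5) {
            // A document in both top-five lists earns a point for each side.
            if (cache.contains(doc)) cachePoints++;
            if (google.contains(doc)) googlePoints++;
        }
        return cachePoints - googlePoints;
    }

    static double mean(int[] diffs) {
        double sum = 0;
        for (int d : diffs) sum += d;
        return sum / diffs.length;
    }

    /** Sample standard deviation; needs at least two values. */
    static double stdDev(int[] diffs) {
        double m = mean(diffs);
        double sumSquares = 0;
        for (int d : diffs) sumSquares += (d - m) * (d - m);
        return Math.sqrt(sumSquares / (diffs.length - 1));
    }

    /** Standard error of the mean difference. */
    static double stdError(int[] diffs) {
        return stdDev(diffs) / Math.sqrt(diffs.length);
    }
}
```

A positive mean difference favours the cache's ranking, a negative one favours Google's.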

This experiment can be repeated under different conditions:

 


RemoteIO  

18 Nov 2003 - 18:12 - NEW   RjHonicky

RemoteOutputStreamProxy, RemoteInputStreamProxy
-----------------------  ----------------------

 + RemoteOutputStreamProxy(Integer streamKey, 
                 URL serverUrl):RemoteOutputStreamProxy
  
   creates a Serializable proxy object which can be passed back
   from a remote call
 
 - serverURL:URL 
 - streamKey:Integer
 - stream:RemoteOutputStreamServer
   
   if serialize must be implemented, then it will not transfer the
   stream object: it will be null when deserialized, so that the
   client is forced to reopen the server

 o implements the OutputStreamInterface
 o caches writes to do them a block at a time
 o flush will first write and then call flush remotely
 o all operations open the stream on demand
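
The block-at-a-time write caching and the "write, then flush remotely" ordering listed above might look roughly like this in Java. It is a sketch under assumptions: `RemoteSink` is a hypothetical stand-in for whatever remote-call mechanism reaches RemoteOutputStreamServer, `java.io.OutputStream` is used in place of the OutputStreamInterface named above, and serialization and on-demand reopening are left out.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;

/** Hypothetical stand-in for the remote calls offered by RemoteOutputStreamServer. */
interface RemoteSink {
    void write(Integer streamKey, byte[] toWrite) throws IOException;
    void flush(Integer streamKey) throws IOException;
    void close(Integer streamKey) throws IOException;
}

/** Collects bytes locally and ships them to the remote side one block at a time. */
class BlockBufferingOutputStream extends OutputStream {
    private final Integer streamKey;
    private final RemoteSink sink;
    private final byte[] block;
    private int used = 0;

    BlockBufferingOutputStream(Integer streamKey, RemoteSink sink, int blockSize) {
        if (blockSize <= 0) throw new IllegalArgumentException("blockSize must be positive");
        this.streamKey = streamKey;
        this.sink = sink;
        this.block = new byte[blockSize];
    }

    @Override
    public void write(int b) throws IOException {
        if (used == block.length) sendBlock(); // buffer full: one remote call per block
        block[used++] = (byte) b;
    }

    private void sendBlock() throws IOException {
        if (used > 0) {
            sink.write(streamKey, Arrays.copyOf(block, used));
            used = 0;
        }
    }

    @Override
    public void flush() throws IOException {
        sendBlock();           // first write whatever is buffered
        sink.flush(streamKey); // then flush the remote stream
    }

    @Override
    public void close() throws IOException {
        sendBlock();
        sink.close(streamKey);
    }
}
```

Buffering trades a little latency for far fewer remote round trips, which matters when each `write` is a cross-process call.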


RemoteOutputStreamServer
------------------------

 - streamMap: HashMap<OutputStreams>
 = addStream(OutputStream stream):RemoteOutputStreamProxy
 
 + write(Integer streamKey, byte[] toWrite):void
   
   writes toWrite.length bytes to the remote stream 

 + flush(Integer streamKey):void

   flushes the remote stream

 + close(Integer streamKey):void

   closes the remote stream and removes it from streamMap
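
A minimal Java sketch of the key-to-stream bookkeeping described above follows. It is illustrative only: the class name `OutputStreamRegistry` is hypothetical, `addStream` returns the bare key instead of a RemoteOutputStreamProxy, keys are assumed to be handed out sequentially, and the remote-call layer is omitted.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;

/** Holds local OutputStreams in a map and exposes them to callers by integer key. */
class OutputStreamRegistry {
    private final Map<Integer, OutputStream> streamMap = new HashMap<Integer, OutputStream>();
    private int nextKey = 0;

    /** Registers a stream and returns the key a proxy would carry. */
    synchronized Integer addStream(OutputStream stream) {
        Integer key = Integer.valueOf(nextKey++);
        streamMap.put(key, stream);
        return key;
    }

    /** Writes toWrite.length bytes to the stream registered under streamKey. */
    synchronized void write(Integer streamKey, byte[] toWrite) throws IOException {
        lookup(streamKey).write(toWrite);
    }

    synchronized void flush(Integer streamKey) throws IOException {
        lookup(streamKey).flush();
    }

    /** Closes the stream and removes it from the map, so later calls with this key fail. */
    synchronized void close(Integer streamKey) throws IOException {
        OutputStream stream = streamMap.remove(streamKey);
        if (stream == null) throw new IOException("unknown stream key " + streamKey);
        stream.close();
    }

    private OutputStream lookup(Integer streamKey) throws IOException {
        OutputStream stream = streamMap.get(streamKey);
        if (stream == null) throw new IOException("unknown stream key " + streamKey);
        return stream;
    }
}
```

Removing the entry on `close` keeps the map from growing without bound as streams come and go.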



RemoteInputStreamServer
-----------------------

 - streamMap: HashMap<elements are InputStreams>
 = addStream(InputStream stream):RemoteInputStreamProxy

 + read(Integer streamKey, int maxLength):byte[]

   reads up to maxLength bytes from the remote stream

 + close(Integer streamKey): void

   closes the remote stream

 + mark(Integer streamKey, int readLimit): void

   Marks the current position in the remote stream.

 + markSupported(Integer streamKey):boolean

   Tests if this input stream supports the mark and  reset methods.

 + reset(Integer streamKey): void
   
   Repositions this stream to the position at the time the mark method
   was last called on this input stream.

 + skip(Integer streamKey, long n):long
    
   Skips over and discards n bytes of data from this input stream.
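
The `read` call above returns a byte array rather than filling a caller-supplied buffer, which suits a remote call. One way to implement "up to maxLength bytes" on top of a plain `java.io.InputStream` is sketched below. The helper name `ChunkReader.readChunk` is hypothetical, and returning an empty array at end of stream is an assumption; the spec above does not say how end of stream is signalled.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

class ChunkReader {
    /**
     * Reads at most maxLength bytes and returns exactly the bytes read.
     * Returns an empty array at end of stream (assumed convention); note that
     * a maxLength of 0 also yields an empty array.
     */
    static byte[] readChunk(InputStream in, int maxLength) throws IOException {
        byte[] buffer = new byte[maxLength];
        int count = in.read(buffer, 0, maxLength);
        if (count <= 0) {
            return new byte[0];
        }
        // Trim so the caller never sees unused buffer space.
        return Arrays.copyOf(buffer, count);
    }
}
```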

-- RjHonicky - 18 Nov 2003

 


RunningDiSC  

03 May 2004 - 01:18 - NEW   RjHonicky

This is a page which describes for developers how to get things up and running.


First check out everything:

export CVS_RSH=ssh
cvs -z3 -d:ext:developername@cvs.sourceforge.net:/cvsroot/dlibrary co dlibrary
cvs -z3 -d:ext:developername@cvs.sourceforge.net:/cvsroot/dlibrary co RabbIT2
cvs -z3 -d:ext:developername@cvs.sourceforge.net:/cvsroot/dlibrary co documentation

Next, set up your shell:

cd dlibrary
. setClasspath.sh

for sh, bash, etc., or 

source setClasspath.csh

for csh, tcsh, etc.

Now build everything:

make

Note, you must have make installed, and this probably only works on Unixish systems

cd ../RabbIT2

./jmake

Next make a directory for the repository:

mkdir /var/tmp/repository
ln -s /var/tmp/repository/rabbit.conf conf/rabbit.conf

Now you're ready to start up DiSC.  Make sure that you are still in the RabbIT2 directory, and do the following:

java dlibrary.RemoteRepositoryImpl /var/tmp/repository localhost 2345&
java rabbit.proxy.Proxy

You will get some debugging output on the screen for each page you download.

-- RjHonicky - 03 May 2004

 


SearchInterface  

07 Dec 2003 - 00:49 - NEW   RjHonicky

The search interface can be accessed by going to the following url: http://localhost:9666/FileSender/search/search_cache.html

You must log in to use this interface: add yourself to the conf/users file (the format is username:password), or you can use the default user RabbIT:RabbIT.

The search interface currently returns a linked title, plus a summary. The summary that the HTML parser currently generates is crap: it just gives the first 100 or so tokens in the document. A better summarizer would be very handy, or even better, the context of the hit, as Google does.
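
To make the difference concrete, here is a small Java sketch contrasting the two summary styles. Everything in it is hypothetical: the class `Snippets`, whitespace tokenization, and exact case-insensitive token matching are simplifications chosen for the example, not the behaviour of the existing HTML parser or of Lucene.

```java
class Snippets {
    /** The style described above: the first maxTokens whitespace-separated tokens. */
    static String leadingTokens(String text, int maxTokens) {
        String[] tokens = text.trim().split("\\s+");
        return join(tokens, 0, Math.min(tokens.length, maxTokens));
    }

    /**
     * The hit-context style: up to radius tokens on either side of the first
     * token equal to term (ignoring case). Falls back to the leading tokens
     * when the term does not occur.
     */
    static String hitContext(String text, String term, int radius) {
        String[] tokens = text.trim().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equalsIgnoreCase(term)) {
                int from = Math.max(0, i - radius);
                int to = Math.min(tokens.length, i + radius + 1);
                return join(tokens, from, to);
            }
        }
        return leadingTokens(text, 2 * radius + 1);
    }

    /** Joins tokens[from..to) with single spaces. */
    private static String join(String[] tokens, int from, int to) {
        StringBuilder sb = new StringBuilder();
        for (int i = from; i < to; i++) {
            if (i > from) sb.append(' ');
            sb.append(tokens[i]);
        }
        return sb.toString();
    }
}
```

For the text "the quick brown fox jumps over the lazy dog", `leadingTokens(text, 3)` gives "the quick brown", while `hitContext(text, "fox", 2)` gives "quick brown fox jumps over".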

I have also added a few pages to perform the RankingExperiment. The search page is at http://localhost:9666/FileSender/search/search_both.html

 


TWikiGuest  

18 Nov 2003 - 07:35 - NEW   TWikiGuest

A guest of this TWiki web, not unlike yourself. You can leave your trace behind you, just add your name in TWikiRegistration and create your own page.

Personal Preferences (details in TWikiVariables)

Related topics

 


TWikiUsers  

14 Nov 2003 - 01:33 - r1.16   RjHonicky

List of TWiki users

Please take the time and add yourself to the list. To do that fill out the form in TWikiRegistration. This will create an account for you which allows you to edit topics.


Note: Do not edit this topic to add a user, use TWikiRegistration instead.

Related topics: OfficeLocations, TWikiGroups

 


ToDo  

01 Dec 2003 - 06:38 - r1.4   RjHonicky

See also: KnownBugs, InterestingIdeas

 


WebChanges  

16 Aug 2001 - 19:56 - r1.2   PeterThoeny

Topics in DLibrary web: Changed: now 13:16 GMT Changed by:
RunningDiSC 03 May 2004 - 01:18 - NEW RjHonicky
This is a page which describes for developers how to get things up and running. First check out everything: export CVS_RSH=ssh cvs -z3 -d:ext:developername@cvs.sourceforge ...  
ProjectDocs 03 May 2004 - 01:00 - r1.5 RjHonicky
We are currently designing experiments which utilize DLibrary. Matulya wrote a quick word document outline, which I have ported into cvs in the documentation module ...  
DiSC 10 Mar 2004 - 01:28 - NEW RjHonicky
DiSC is the new name for the DLibrary project Main.RjHonicky 10 Mar 2004  
DocumentSources 10 Mar 2004 - 01:23 - NEW RjHonicky
Here are some starting points for online documents which might belong in a DiSC repository project gutenberg online books citeseer computer science articles looksmart ...  
WebHome 10 Mar 2004 - 01:20 - r1.17 RjHonicky
Folks, if you've never used a wiki, learn about wiki webs on the TWiki.GoodStyle page. News: Main.RjHonicky 08 Dec 2003 : Fixed a locking problem (see KnownBugs ...  
InterestingIdeas 05 Mar 2004 - 14:09 - r1.5 RjHonicky
How do laptops join a network? Only cache own requests, but can search the entire repository They can also contribute their documents (which they supposedly got while ...  
DownloadedPapers 14 Dec 2003 - 02:28 - r1.3 RjHonicky
Attach papers here by clicking on the "Attach" link at the bottom  
ProjectLinks 14 Dec 2003 - 02:18 - r1.7 RjHonicky
The sourceforge project page http://sourceforge.net/projects/dlibrary/ RabbIT: a transcoding proxy http://www.khelekore.org/rabbit/readme.shtml Jakarta Lucene, an ...  
KnownBugs 09 Dec 2003 - 08:14 - r1.5 RjHonicky
all (major) known bugs seem to have been resolved!  
RankingExperiment 07 Dec 2003 - 01:18 - NEW RjHonicky
In the RankingExperiment, the cache is first primed using a trace, or perhaps simply from usage over a period. A set of users are then given a list of search queries ...  
SearchInterface 07 Dec 2003 - 00:49 - NEW RjHonicky
The search interface can be accessed by going to the following url: http://localhost:9666/FileSender/search/search_cache.html You must log in to use this interface ...  
ToDo 01 Dec 2003 - 06:38 - r1.4 RjHonicky
See also: KnownBugs, InterestingIdeas Add in support for PDF via plugins provided by Lucene (there is an interface in RemoteRepositoryImpl for adding content types ...  
ProxyStorageInterface 22 Nov 2003 - 19:03 - r1.2 RjHonicky
The proxy server (RabbIT) currently stores meta-data and data separately: there is a single file which contains cache data, plus the header of each of the requests ...  
RemoteIO 18 Nov 2003 - 18:12 - NEW RjHonicky
RemoteOutputStreamProxy, RemoteInputStreamProxy RemoteOutputStreamProxy(Integer streamKey, URL serverUrl):RemoteOutputStreamProxy creates a Serializable proxy object ...  
DLibraryDesign 18 Nov 2003 - 18:09 - NEW RjHonicky
Here is a simpler ToDo list. See ProxyStorageInterface for details on how we interface between the Proxy server and the RemoteRepository. The list of MatulyasQuestions ...  
MatulyasQuestions 18 Nov 2003 - 18:09 - NEW RjHonicky
Data Collection Prof Brewer has a trace of web-traffic from the router here at Berkeley: old We might want to build a corpus to test the library type of access: do ...  
WebPreferences 18 Nov 2003 - 08:00 - r1.2 RjHonicky
TWiki.DLibrary Web Preferences The following settings are web preferences of the TWiki.DLibrary web. These preferences overwrite the site-level preferences in TWIKIWEB ...  
TWikiGuest 18 Nov 2003 - 07:35 - NEW TWikiGuest
A guest of this TWiki web, not unlike yourself. You can leave your trace behind you, just add your name in TWIKIWEB .TWikiRegistration and create your own page. Personal ...  
TWikiUsers 14 Nov 2003 - 01:33 - r1.16 RjHonicky
List of TWiki users Please take the time and add yourself to the list. To do that fill out the form in TWIKIWEB .TWikiRegistration. This will create an account for ...  
WebIndex 24 Nov 2001 - 11:36 - r1.2 PeterThoeny
SEARCH{"\. " scope "topic" regex "on" nosearch "on"} See also the faster WebTopicList  
WebChanges 16 Aug 2001 - 19:56 - r1.2 PeterThoeny
INCLUDE{" TWIKIWEB .WebChanges"}  
WebSearch 08 Aug 2001 - 05:57 - r1.8 PeterThoeny
INCLUDE{" TWIKIWEB .WebSearch"}  

Number of topics: 22

 


WebHome  

10 Mar 2004 - 01:20 - r1.17   RjHonicky

Folks, if you've never used a wiki, learn about wiki webs on the GoodStyle page.

News:

Starting Points:

 


WebIndex  

24 Nov 2001 - 11:36 - r1.2   PeterThoeny


See also the faster WebTopicList

 


WebPreferences  

18 Nov 2003 - 08:00 - r1.2   RjHonicky

TWiki.DLibrary Web Preferences

The following settings are web preferences of the TWiki.DLibrary web. These preferences overwrite the site-level preferences in TWikiPreferences, and can be overwritten by user preferences (your personal topic, i.e. TWikiGuest in the TWiki.Main web)

Preferences:

Notes:

Related Topics:

 


WebSearch  

08 Aug 2001 - 05:57 - r1.8   PeterThoeny

 



  Copyright © 1999-2003 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.